47 research outputs found
Penalized Orthogonal-Components Regression for Large p Small n Data
We propose a penalized orthogonal-components regression (POCRE) for large p
small n data. Orthogonal components are sequentially constructed to maximize,
upon standardization, their correlation to the response residuals. A new
penalization framework, implemented via empirical Bayes thresholding, is
presented to effectively identify sparse predictors of each component. POCRE is
computationally efficient owing to its sequential construction of leading
sparse principal components. In addition, such construction offers other
properties such as grouping highly correlated predictors and allowing for
collinear or nearly collinear predictors. With multivariate responses, POCRE
can construct common components and thus build up latent-variable models for
large p small n data. Comment: 12 pages
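The sequential construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' POCRE implementation: plain soft-thresholding stands in for the empirical Bayes thresholding, and the function names, the `lam` parameter, and the deflation scheme are assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink each entry of v toward zero by lam.

    Assumption: a simple stand-in for empirical Bayes thresholding.
    """
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_orthogonal_components(X, y, n_components=2, lam=0.5):
    """Sequentially build sparse, mutually orthogonal components.

    X : (n, p) predictor matrix with non-constant columns
    y : (n,) response
    lam : hypothetical tuning parameter in [0, 1); the threshold is
          lam times the largest absolute loading.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized predictors
    r = y - y.mean()                           # response residuals
    loadings, scores = [], []
    for _ in range(n_components):
        w = Xd.T @ r / n                       # covariance with the residuals
        w = soft_threshold(w, lam * np.abs(w).max())  # sparsify the loading
        if not np.any(w):
            break
        w /= np.linalg.norm(w)
        t = Xd @ w                             # component scores
        tt = t @ t
        Xd = Xd - np.outer(t, t @ Xd) / tt     # deflate the predictors
        r = r - t * (t @ r) / tt               # update the residuals
        loadings.append(w)
        scores.append(t)
    if not scores:
        return np.empty((0, X.shape[1])), np.empty((n, 0))
    return np.array(loadings), np.column_stack(scores)
```

Because each new component is built from predictors already deflated against the earlier scores, the returned score columns are orthogonal to one another, and the thresholding step leaves only a subset of predictors with nonzero loadings in each component.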
Case-control genome-wide association study of rheumatoid arthritis from Genetic Analysis Workshop 16 using penalized orthogonal-components regression-linear discriminant analysis
Currently, genome-wide association studies (GWAS) are conducted by collecting a massive number of single-nucleotide polymorphisms (SNPs; i.e., large p) for a relatively small number of individuals (i.e., small n), and associations are made between clinical phenotypes and genetic variation one SNP at a time. Univariate association approaches like this ignore the linkage disequilibrium between SNPs in regions of low recombination, which lowers the reliability of candidate gene identification. Here we propose to improve the case-control GWAS approach by implementing linear discriminant analysis (LDA) through penalized orthogonal-components regression (POCRE), a newly developed variable selection method for large p small n data. The proposed POCRE-LDA method was applied to the Genetic Analysis Workshop 16 case-control data for rheumatoid arthritis (RA). In addition to the two regions on chromosomes 6 and 9 previously associated with RA by GWAS, we identified SNPs on chromosomes 10 and 18 as potential candidates for further investigation.
Long-Term Outcomes of Three-Dimensional High-Dose-Rate Brachytherapy for Locally Recurrent Early T-Stage Nasopharyngeal Carcinoma
Background: Brachytherapy (BT) is one of the techniques available for retreatment of patients with locally recurrent nasopharyngeal carcinoma (rNPC). In this study, we evaluated the treatment outcome and late toxicities of three-dimensional high-dose-rate brachytherapy (3D-HDR-BT) for patients with locally rNPC. Materials and Methods: This is a retrospective study involving 36 patients with histologically confirmed rNPC from 2004 to 2011. Of the 36 patients, 17 underwent combined-modality treatment (CMT) consisting of external beam radiotherapy (EBRT) followed by 3D-HDR-BT, while the other 19 underwent 3D-HDR-BT alone. The median dose of EBRT for the CMT group was 60 (range, 50–66) Gy, with an additional median dose of BT of 16 (range, 9–20) Gy. The median dose for the 3D-HDR-BT group was 32 (range, 20–36) Gy. The measured treatment outcomes were the 5- and 10-year locoregional recurrence-free survival (LRFS), disease-free survival (DFS), overall survival (OS), and late toxicities. Results: The median age at recurrence was 44.5 years. The median follow-up period was 70 (range, 6–142) months. The 5-year LRFS, DFS, and OS for the entire patient group were 75.4, 55.6, and 74.3%, respectively, while the 10-year LRFS, DFS, and OS for the entire patient group were 75.4, 44.2, and 53.7%, respectively. The 10-year LRFS in the CMT group was higher than that in the 3D-HDR-BT-alone group (93.8 vs. 58.8%, HR: 7.595, 95% CI: 1.233–61.826, p = 0.025). No grade 4 late radiotherapy-induced toxicities were observed. Conclusions: 3D-HDR-BT achieves favorable clinical outcomes with mild late toxicity in patients with locally rNPC.
Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster
Comparison of normalization methods across conditions. Boxplots show the differences in the coefficient of variation across flies in each genotype/sex/environment condition. (PDF 245 kb)
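For readers unfamiliar with the statistic summarized in these boxplots, the coefficient of variation is the standard deviation divided by the mean. The following is a minimal sketch, assuming a hypothetical genes-by-flies matrix of normalized counts for a single condition; the function name and data layout are illustrative and not taken from the study.

```python
import numpy as np

def coefficient_of_variation(counts):
    """Per-gene coefficient of variation (sample SD / mean) across flies.

    counts : (genes, flies) array of normalized expression values for
             one genotype/sex/environment condition (assumed layout).
    Genes with a zero mean get NaN rather than a division error.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    sd = counts.std(axis=1, ddof=1)  # sample standard deviation
    return np.divide(sd, mean, out=np.full_like(mean, np.nan), where=mean > 0)
```

Computing this vector once per normalization method and per condition gives the quantities whose distributions such boxplots would compare.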
Fast Ionic Diffusion-Enabled Nanoflake Electrode by Spontaneous Electrochemical Pre-Intercalation for High-Performance Supercapacitor
Layered intercalation compound NaₓMnO₂ (x = 0.7 and 0.91) nanoflakes have been prepared directly through a wet electrochemical process, with Na ions intercalated into MnO₂ interlayers spontaneously. The as-prepared NaₓMnO₂ nanoflake-based supercapacitors exhibit faster ionic diffusion with enhanced redox peaks, tenfold-higher energy densities up to 110 Wh·kg⁻¹, and higher capacitances over 1000 F·g⁻¹ in an aqueous sodium system compared with traditional MnO₂ supercapacitors. Owing to the free-standing electrode structure and suitable crystal structure, NaₓMnO₂ nanoflake electrodes also maintain outstanding electrochemical stability, with capacitance retention up to 99.9% after 1000 cycles. In addition, the pre-intercalation effect is further studied to explain this enhanced electrochemical performance. This study indicates that suitable pre-intercalation is effective in improving the diffusion of electrolyte cations and other electrochemical performance for layered oxides, and suggests that the as-obtained nanoflakes are promising materials for combining high energy and power density in advanced supercapacitors. Chemistry and Chemical Biolog
Genome-wide association analysis of GAW17 data using an empirical Bayes variable selection
Next-generation sequencing technologies enable us to explore rare functional variants. However, most current statistical techniques are too underpowered to capture signals of rare variants in genome-wide association studies. We propose a supervised coalescing of single-nucleotide polymorphisms to obtain gene-based markers that can stably reveal possible genetic effects related to rare alleles. We use a newly developed empirical Bayes variable selection algorithm to identify associations between studied traits and genetic markers. Using our novel method, we analyzed the three continuous phenotypes in the GAW17 data set across 200 replicates, with intriguing results.
Genome-wide case-control study in GAW17 using coalesced rare variants
Genome-wide association studies have successfully identified numerous loci at which common variants influence disease risks or quantitative traits of interest. Despite these successes, the variants identified by these studies have generally explained only a small fraction of the variation in the phenotype. One explanation may be that many rare variants that are not included in the common genotyping platforms contribute substantially to the genetic variation of the diseases. Next-generation sequencing, which better allows for the analysis of rare variants, is now becoming available and affordable; however, the presence of a large number of rare variants makes it statistically challenging to stably identify the disease-causing variants. We conduct a genome-wide association study of the Genetic Analysis Workshop 17 case-control data, produced by next-generation sequencing, and propose collapsing the rare variants within each genetic region, through a supervised dimension reduction algorithm, into several macrovariants. The phenotype is then associated simultaneously with all common variants and macrovariants by linear discriminant analysis using the penalized orthogonal-components regression algorithm. The results suggest that the proposed analysis strategy shows promise but needs further development.
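The idea of collapsing rare variants in a region into a macrovariant can be sketched as follows. This is a simplified stand-in, not the supervised dimension reduction algorithm used in the study: each rare variant is weighted only by the sign of its covariance with case status, and the function name, the 0/1/2 genotype coding, and the `maf_cutoff` value are assumptions made for illustration.

```python
import numpy as np

def collapse_rare_variants(G, y, maf_cutoff=0.05):
    """Collapse the rare variants of one region into a single score.

    G : (n, m) genotype matrix coded as minor-allele counts 0/1/2 (assumed)
    y : (n,) case-control status coded 1/0
    maf_cutoff : hypothetical minor-allele-frequency cutoff defining "rare"

    Returns the per-individual macrovariant and the boolean mask of the
    variants that were treated as rare.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    maf = G.mean(axis=0) / 2.0                 # allele frequency per variant
    rare = (maf > 0) & (maf < maf_cutoff)
    G_rare = G[:, rare]
    # Supervised step: the sign of each variant's covariance with the phenotype.
    cov = (G_rare - G_rare.mean(axis=0)).T @ (y - y.mean())
    weights = np.sign(cov)
    return G_rare @ weights, rare
```

The resulting macrovariant could then be placed alongside the common variants as one more predictor column in a downstream association model.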
Supervised dimension reduction for high-dimensional generalized linear models
Dimensionality reduction has become an increasingly important strategy in high-dimensional data analysis in modern statistics. This is largely driven by the need to analyze massive data sets involving ill-posed problems due to high dimensionality and multicollinearity issues. In this thesis, we propose two new regression-based modeling methods for high-dimensional classification problems by implementing the dimension reduction idea. To deal with the generalized linear model (GLM) with high-dimensional data, we propose a strategy that implements the supervised dimension reduction idea of partial least squares (PLS) to fit high-dimensional GLMs, building up generalized orthogonal-components regression (GOCRE) for GLMs. Unlike the existing methods based on the extension of PLS to categorical data, we sequentially construct orthogonal predictors, each of which results from an iterative construction run to convergence. The bias correction procedure of Firth (1993) is also applied. To implement dimension reduction and variable selection simultaneously in high-dimensional data analysis, we develop Sparse-GOCRE by incorporating a penalized approach into the GOCRE framework. Within the sequential construction of components in GOCRE, a penalized approach is used to identify the sparse predictors for each component. Two different penalization strategies are considered: the L1 penalty and empirical Bayes thresholding. Our methods not only provide a solution to the high dimensionality issue but are also able to identify variables that are highly correlated or share common coherent patterns. Both simulation studies and real data analysis of gene expression microarray data are presented to illustrate the competitive performance of our methods in comparison with several existing methods.